
feat(callcenter::transcode): outer ↔ inner ontology mapper + parallelbetrieb #309

Merged
AdaWorldAPI merged 1 commit into main from claude/transcode-foundry-mapper-L3DF0 on Apr 30, 2026

@AdaWorldAPI
Owner

Summary

Reusable Foundry-style outer ↔ inner ontology mapper under lance-graph-callcenter::transcode, plus the one deliberate transition bandaid (parallelbetrieb) for the MySQL ↔ DataFusion ↔ SPO ground-truth reconciliation.

Domain-agnostic — every module operates on whatever Ontology is handed in. No medcare, smb, or any other vertical lives under transcode/.

Why this is a callcenter submodule, not a sibling crate

Medcare-rs PR #73 (merged) framed lance-graph-callcenter as the canonical Foundry / supabase-realtime transcode crate. A sibling lance-graph-transcode would create a competing framing. These are the four reusable primitives + the one bandaid; they belong here, alongside ontology_dto and version_watcher.

What ships

| Module | Role | Tests |
|---|---|---|
| transcode::zerocopy | OuterColumn / OuterSchema / OwnedColumn / from_columns. Cheap-zerocopy lane: Vec<T> → Buffer is O(1). Refuses undeclared columns at the boundary. | 4 |
| transcode::cam_pq_decode | CamPqDecoder trait + PassthroughDecoder for CodecRoute::{Skip, Passthrough}. Codec math stays in lance_graph_contract::cam. | 3 |
| transcode::spo_filter | SpoFilterTranslator: SQL filter terms → SpoLookup. Uses canonical lance_graph_contract::hash::fnv1a. | 5 |
| transcode::ontology_table | OntologyTableProvider: DataFusion TableProvider over (Ontology, entity_type). Round-1 backs scan with MemTable; SpoStore reader is the next round. | 4 |
| transcode::parallelbetrieb | DriftEvent + DriftKind + Reconciler trait. The one deliberate bandaid. Schema matches MedCareV2's C# DriftEvent.ToJson() so both sides feed one dashboard. | 10 |

Total: 26 tests, all passing.
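
For readers unfamiliar with the hash named in the spo_filter row, here is a minimal sketch of FNV-1a. The 64-bit width and the `predicate_fingerprint` helper are assumptions made for illustration; this PR does not state the width or signature of `lance_graph_contract::hash::fnv1a`.

```rust
/// 64-bit FNV-1a over a byte slice, using the standard offset basis and prime.
/// Assumption: 64 bits is chosen only for illustration; the width used by
/// `lance_graph_contract::hash::fnv1a` is not stated in this PR.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        // FNV-1a order: xor the byte in first, then multiply by the prime.
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Hypothetical helper: fingerprint a predicate name the way a filter
/// translator might before building a lookup key.
pub fn predicate_fingerprint(predicate: &str) -> u64 {
    fnv1a_64(predicate.as_bytes())
}
```

The hash is deterministic and one-way, so equal predicate names always map to the same fingerprint but a fingerprint cannot be turned back into a name.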

Outer ↔ inner ontology framing

┌─────────────────────────────────────────────────────────────────┐
│  Outer ontology  (already in crate::ontology_dto)               │
│  Schema · LinkSpec · ActionSpec · Locale · Label · OntologyDto  │
│  ─ what every Foundry / supabase / PostgREST consumer sees      │
└──────────────────────┬──────────────────────────────────────────┘
                       │  transcode/  (this PR)
                       ▼
┌─────────────────────────────────────────────────────────────────┐
│  Inner ontology  (BindSpace + SPO triple store)                 │
│  FingerprintColumns · CAM-PQ codes · SchemaExpander triples     │
│  ─ what the membrane actually stores                            │
└─────────────────────────────────────────────────────────────────┘

The transcode subtree is the mapper. Five files; ~1 100 LOC. Uses only primitives the rest of the workspace already exposes — no new dep on the bgz-tensor / cognitive-shader-driver internals.

Why parallelbetrieb is in this PR (and labelled as transitional)

Every other transcode/ module should still make sense in five years. parallelbetrieb is different by design: it's the MySQL ground-truth reconciler that runs in F1 → F4 to prove the new substrate is correct. The module's doc-comment is explicit:

Even at F5 the reconciler stays — MySQL is permanent — but its mode shifts from "consensus required for any commit" to "background witness that emits drift events when something diverges". The bandaid framing is for the parallel-evaluation overhead, not for the witness itself.

The hard rules are spelled out in the module doc:

  • No Foundry primitive in this module. If the type is reusable beyond parallelbetrieb, it goes in a sibling module.
  • No silent reconciliation. Canonicaliser must collapse equivalents (NULL vs 0 → false) before comparing; never agree-to-disagree.
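
The "no silent reconciliation" rule can be sketched roughly as follows. All names here (`RawValue`, `canonical_bool`, `Verdict`, `compare_bool_cells`) are hypothetical and are not the module's real `DriftEvent` / `Reconciler` types; the sketch only shows canonicalising both sides of a boolean column before comparing them.

```rust
/// A raw cell as read from either backend (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub enum RawValue {
    Null,
    Int(i64),
    Bool(bool),
    Text(String),
}

/// Canonical form for a boolean-typed column: NULL and 0 both collapse to false.
pub fn canonical_bool(v: &RawValue) -> Option<bool> {
    match v {
        RawValue::Null => Some(false),
        RawValue::Int(n) => Some(*n != 0),
        RawValue::Bool(b) => Some(*b),
        // Text has no boolean canonical form in this sketch.
        RawValue::Text(_) => None,
    }
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Agree,
    Drift { left: String, right: String },
}

/// Compare two cells only after canonicalising both sides.
/// Anything that cannot be canonicalised is reported as drift, never ignored.
pub fn compare_bool_cells(mysql: &RawValue, spo: &RawValue) -> Verdict {
    match (canonical_bool(mysql), canonical_bool(spo)) {
        (Some(a), Some(b)) if a == b => Verdict::Agree,
        _ => Verdict::Drift {
            left: format!("{:?}", mysql),
            right: format!("{:?}", spo),
        },
    }
}
```

With this shape, `NULL` on one side and `0` on the other agree, while `NULL` against `1` surfaces as a drift record rather than being dropped.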

What this PR does NOT introduce

  • No duplicate of ontology_dto. The DTO surface stays canonical; transcode/mod.rs re-exports OntologyDto / EntityTypeDto / Locale / Label / SchemaExpander from one path so consumers reach the whole transcode surface from one import.
  • No duplicate of version_watcher. Realtime fan-out belongs to LanceVersionWatcher. transcode does not introduce a second channel primitive.
  • No new HTTP / WS deps. Realtime transport adapters live in the consumer-facing serve feature.

Cargo / features

  • Adds async-trait = "0.1" (small, no transitive deps).
  • transcode::zerocopy + transcode::ontology_table compile under query-lite or persist (both pull in arrow); the others are always-on.
  • transcode::ontology_table is gated on query-lite because it needs datafusion::TableProvider.

Verified

  • cargo check: clean across 6 feature combos {default, persist, query-lite, audit-log, query-lite+persist+audit-log, auth-rls-lite+query-lite}
  • cargo test transcode::: 26 passed, 0 failed
  • cargo clippy: zero transcode warnings across the same feature combos
  • rustfmt --check: clean on all 6 new files

Files changed

  • crates/lance-graph-callcenter/Cargo.toml (+5 lines: async-trait dep)
  • crates/lance-graph-callcenter/src/lib.rs (+8 lines: pub mod transcode;)
  • crates/lance-graph-callcenter/src/transcode/mod.rs (new, 78 lines)
  • crates/lance-graph-callcenter/src/transcode/zerocopy.rs (new, ~360 lines)
  • crates/lance-graph-callcenter/src/transcode/cam_pq_decode.rs (new, ~135 lines)
  • crates/lance-graph-callcenter/src/transcode/spo_filter.rs (new, ~165 lines)
  • crates/lance-graph-callcenter/src/transcode/ontology_table.rs (new, ~185 lines)
  • crates/lance-graph-callcenter/src/transcode/parallelbetrieb.rs (new, ~265 lines)

Generated by Claude Code



…betrieb

Reusable Foundry-style mapper between the wire-shape DTO surface
(already in `ontology_dto`) and the inner SoA / SPO substrate.
Domain-agnostic — every module operates on whatever `Ontology` is
handed in. No medcare or smb specifics live under transcode/.

## Modules (all under crate::transcode)

  zerocopy        OuterColumn / OuterSchema / OwnedColumn /
                  from_columns. The cheap-zerocopy lane: Vec<T> →
                  Buffer is O(1) reinterpretation. Refuses
                  undeclared columns at the boundary.
  cam_pq_decode   CamPqDecoder trait + PassthroughDecoder for
                  CodecRoute::{Skip, Passthrough}. The codec math
                  itself stays in lance_graph_contract::cam.
  spo_filter      SpoFilterTranslator: SQL filter terms →
                  SpoLookup. Domain-agnostic; uses canonical
                  lance_graph_contract::hash::fnv1a for predicate
                  fingerprints.
  ontology_table  OntologyTableProvider: DataFusion TableProvider
                  over (Ontology, entity_type). Round-1 backs scan
                  with MemTable; SpoStore reader is the next round.
  parallelbetrieb DriftEvent + DriftKind + Reconciler trait. The
                  ONE deliberate transition bandaid: MySQL ↔
                  DataFusion ↔ SPO ground-truth reconciliation.
                  Schema matches MedCareV2's C# DriftEvent.ToJson()
                  so both sides feed one dashboard.

## Why a submodule, not a sibling crate

PR #73 on medcare-rs explicitly framed `lance-graph-callcenter` as
the Foundry / supabase-realtime transcode crate. A sibling crate
would create a competing framing. These are the four reusable
primitives + the one bandaid; they belong here.

## What this does NOT introduce

  - No duplicate of ontology_dto. The DTO surface stays canonical;
    transcode re-exports OntologyDto / EntityTypeDto / Locale /
    Label / SchemaExpander from one path so consumers reach the
    whole transcode surface from one import.
  - No duplicate of version_watcher. Realtime fan-out belongs to
    LanceVersionWatcher; transcode does not introduce a second
    channel primitive.
  - No new HTTP / WS deps. Realtime transport adapters live in
    the consumer-facing serve feature.

## Cargo / features

  - Adds `async-trait = "0.1"` (small, no transitive deps).
  - transcode/zerocopy + transcode/ontology_table compile under
    `query-lite` or `persist` (both pull in arrow); the others
    are always-on.

## Verified

  - cargo check: clean across {default, persist, query-lite,
    audit-log, query-lite+persist+audit-log,
    auth-rls-lite+query-lite}
  - cargo test transcode::: 26 passed, 0 failed
  - clippy: zero transcode warnings across the same feature combos
  - rustfmt --check: clean on all 6 new files

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a60c1eaa9e

Comment on lines +263 to +267
Some((_, slot_opt)) => slot_opt
.take()
.ok_or_else(|| TranscodeError::MissingColumn(soa_col.name.to_string()))?,
None => return Err(TranscodeError::MissingColumn(soa_col.name.to_string())),
};

P1: Allow optional ontology columns to be omitted

from_columns currently raises MissingColumn for every schema column that is not present in body_columns, regardless of whether the property is optional. For ontologies with nullable fields (for example krankenkasse), omitting that column or trying to represent all-null values fails the batch build, which makes optional fields effectively required at runtime. This breaks ingestion/query paths for sparse records and should instead synthesize null arrays for optional properties (while still erroring on required ones).
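
The behaviour this comment asks for could look roughly like the sketch below. `Presence`, `BuildError`, `Column` and `assemble` are hypothetical stand-ins, not the crate's `OuterSchema` or Arrow types; plain `Vec<Option<String>>` columns stand in for Arrow arrays, and row-length validation of provided columns is left out.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Presence {
    Required,
    Optional,
}

#[derive(Debug, PartialEq)]
pub enum BuildError {
    MissingColumn(String),
    UndeclaredColumn(String),
}

/// One column of nullable string cells; stands in for an Arrow array.
pub type Column = Vec<Option<String>>;

/// Assemble columns in schema order. Omitted optional columns are filled
/// with nulls; omitted required columns and undeclared columns are errors.
pub fn assemble(
    schema: &[(&str, Presence)],
    mut provided: BTreeMap<String, Column>,
    row_count: usize,
) -> Result<Vec<(String, Column)>, BuildError> {
    let mut out = Vec::with_capacity(schema.len());
    for (name, presence) in schema {
        let col = match (provided.remove(*name), *presence) {
            (Some(col), _) => col,
            // Synthesize an all-null column for an omitted optional property.
            (None, Presence::Optional) => vec![None; row_count],
            (None, Presence::Required) => {
                return Err(BuildError::MissingColumn(name.to_string()))
            }
        };
        out.push((name.to_string(), col));
    }
    // Whatever is left in the map was never declared by the schema.
    if let Some(extra) = provided.into_keys().next() {
        return Err(BuildError::UndeclaredColumn(extra));
    }
    Ok(out)
}
```

Keeping the undeclared-column check means the boundary stays strict in one direction (nothing unknown gets in) while becoming lenient in the other (sparse records are accepted).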

Comment on lines +79 to +82
let id = entity_type_id(self.ontology, s);
if id != 0 {
out.entity_type_id = Some(id);
}

P2: Preserve impossible entity_type filters in SPO lookup

When translating entity_type = 'UnknownType', the code computes ID 0 and then drops the predicate entirely, leaving entity_type_id unset. Any caller that relies on SpoLookup alone will treat this as an unconstrained lookup and can return rows instead of the SQL-correct empty result set. Unknown entity types should be encoded as an impossible constraint (or explicit no-match state), not silently removed.
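
One way to encode the "explicit no-match state" suggested here is a three-valued constraint instead of an `Option` of the id. The sketch below is an assumption about shape only: `TypeConstraint`, `resolve_entity_type` and `admits` are hypothetical names, and the real `SpoLookup` fields are not shown in this PR.

```rust
/// Three-valued constraint, so "unknown type" is distinguishable from "no filter".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeConstraint {
    /// No entity_type predicate was given.
    Any,
    /// The predicate resolved to a known type id.
    Exactly(u32),
    /// The predicate named an unknown type: the result set must be empty.
    NoMatch,
}

/// Resolve an optional `entity_type = '...'` filter against the known types.
pub fn resolve_entity_type(known: &[(&str, u32)], filter: Option<&str>) -> TypeConstraint {
    match filter {
        None => TypeConstraint::Any,
        Some(name) => known
            .iter()
            .find(|(n, _)| *n == name)
            .map(|(_, id)| TypeConstraint::Exactly(*id))
            .unwrap_or(TypeConstraint::NoMatch),
    }
}

/// Whether a row with the given type id passes the constraint.
pub fn admits(constraint: TypeConstraint, row_type_id: u32) -> bool {
    match constraint {
        TypeConstraint::Any => true,
        TypeConstraint::Exactly(id) => id == row_type_id,
        TypeConstraint::NoMatch => false,
    }
}
```

A caller that only sees the constraint can then return the SQL-correct empty result for `entity_type = 'UnknownType'` without re-checking the original filter.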

@AdaWorldAPI AdaWorldAPI merged commit c3bff0c into main Apr 30, 2026
1 of 5 checks passed
AdaWorldAPI added a commit that referenced this pull request Apr 30, 2026
                  partial writes + CachedOntology + route validation

Addresses the five concrete gaps the brutally-honest review on #309
called out:

  1. arrow_type_for_semantic no longer collapses everything to Utf8.
     Currency → Float32, Date(_) → Date32, CustomerId / InvoiceNumber
     → UInt64. DataFusion can now do real numeric / temporal predicate
     pushdown on those columns. The remaining semantic types stay
     Utf8 by deliberate choice — round 3 may pivot specific ones (Geo
     → struct{lat,lon}) when a consumer asks.

  2. CachedOntology helper extracted upstream. Bundles an Arc<Ontology>
     with eagerly-projected DTOs per Locale (De/En). Prevents the
     per-call OntologyDto::from_ontology rebuild that medcare-rs's
     MedcareOntology and smb-office-rs's session ontology both grew
     independently. One implementation, one bug surface.

  3. validate_route(route, ontology) added to parallelbetrieb. Parses
     /api/{entity_type}/{...} and asserts entity_type resolves to a
     declared Schema.name (case-insensitive). 4 tests covering
     accept-valid, reject-typo, reject-missing-prefix, reject-empty.
     Used as a pre-flight check for static route lists; the runner
     itself doesn't gate on it because typo routes are still genuine
     drift telemetry.

  4. from_columns_partial added for PATCH-style upserts. Allows
     omitted Optional / Free columns (filled with Arrow null arrays);
     still rejects missing Required columns and undeclared columns.
     The existing from_columns keeps its strict full-row contract.

  5. route_for_column now reads OuterColumn.codec_route directly,
     copied from the upstream PropertySpec.codec_route at schema
     derivation. Round-1 went through route_tensor which is
     calibrated for model-weight tensor names (q_proj, lm_head, ...)
     and would silently mis-classify document predicates. The
     contract's own field is now the source of truth — drift-by-
     construction impossible.
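
The route check in item 3 can be sketched as below. This is an illustration under assumptions, not the shipped function: `check_route` and `RouteError` are hypothetical names, and the sketch takes a plain list of schema names where the real `validate_route` takes an ontology.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum RouteError {
    MissingApiPrefix,
    EmptyEntitySegment,
    UnknownEntityType(String),
}

/// Parse `/api/{entity_type}/{...}` and resolve the entity segment against
/// the declared schema names, ignoring ASCII case. Returns the declared name.
pub fn check_route<'a>(route: &str, schema_names: &[&'a str]) -> Result<&'a str, RouteError> {
    let rest = route
        .strip_prefix("/api/")
        .ok_or(RouteError::MissingApiPrefix)?;
    // The entity type is everything up to the next slash.
    let segment = rest.split('/').next().unwrap_or("");
    if segment.is_empty() {
        return Err(RouteError::EmptyEntitySegment);
    }
    schema_names
        .iter()
        .copied()
        .find(|name| name.eq_ignore_ascii_case(segment))
        .ok_or_else(|| RouteError::UnknownEntityType(segment.to_string()))
}
```

The three error cases line up with the reject-missing-prefix, reject-empty and reject-typo tests named in the commit message.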

Adds 7 new tests and rewrites 1 existing cam_pq_decode test to assert
the new semantics:
  - cached_ontology_projects_every_locale_at_construction
  - cached_ontology_clones_are_arc_cheap
  - cached_ontology_inner_round_trips
  - validate_route_accepts_known_entity_type
  - validate_route_rejects_typo_entity_type
  - validate_route_rejects_missing_api_prefix
  - validate_route_rejects_empty_entity_segment
  - route_for_required_scalar_column_uses_property_spec_default
    (rewritten from route_for_scalar_columns_skips_codec; old test
    asserted the wrong thing — required scalars should default to
    Passthrough, not Skip)

Verified:
  cargo check across {default, query-lite, query-lite+audit-log+
    auth-rls-lite} — clean
  cargo test transcode:: → 33/33 passed (was 26/26 in #309; +7
    new + 1 rewritten)
  cargo clippy on the same combos — zero transcode warnings
  rustfmt --check — clean across all 4 modified files
AdaWorldAPI added a commit that referenced this pull request Apr 30, 2026
…ring

F1 (MySQL <-> SPO oracle parity) shipped via MedCareV2 PRs #1, #2, #3,
medcare-rs PR #71, and lance-graph PR #309. The vision doc still claimed
F1 was "the next concrete deliverable". Rewrite section 7 to: state F1
has shipped, describe the LanceProbe -> ParityWitness -> DriftSink flow,
name the contract DTO
(lance-graph-callcenter::transcode::parallelbetrieb::DriftEvent), list
F1's known gaps (no latency claims; in-memory ring buffer), and state
F2 RBAC+audit wiring (medcare-rs adopting RlsRewriter) as the next
posture. No other sections touched.
AdaWorldAPI added a commit that referenced this pull request Apr 30, 2026
Adds the missing reverse-direction helper: takes a stream of
ExpandedTriple (what Ontology::expand_entity returns) and
materialises a RecordBatch grouped by subject_label. This is the
Phase 5 (in #309's ROADMAP) / Phase 2-B (in the SQL-SPO bridge
plan) bridge that consumer code needs to roundtrip an entity row
through the SPO substrate.

## Why this shape (and not 'walk SpoStore::scan(lookup)')

The original Phase-2 plan-doc described 'walk SpoStore::scan(lookup)'
as the read path. SpoStore (in lance-graph proper) is
fingerprint-Hamming-indexed and doesn't expose a flat scan(lookup)
method — its API is per-verb (query_forward / query_reverse /
query_relation) and the FNV-1a fingerprint is one-way (subject can't
be reversed back to entity_id).

ExpandedTriple is the right input shape: it carries the canonical
subject_label (`entity:{type}:{id}`) so entity_id is recoverable,
and the contract crate already mints these via SchemaExpander.
Consumers wire SpoStore -> ExpandedTriple -> RecordBatch through
their own subject_id-aware reader; this helper does the second
step canonically.

## What ships

  triples_to_batch(soa, &[ExpandedTriple]) -> Result<RecordBatch>
    - Groups by subject_label (BTreeMap, lex sort, stable)
    - Parses entity_id from `entity:{type}:{id}`
    - Emits one row per subject; missing-required surfaces as
      MissingColumn error
    - Drops triples whose predicate isn't declared in the schema
      (BBB outer-view rule)
    - Rejects mixed entity_type via EntityTypeMismatch

  round1_lenient_schema(soa) -> SchemaRef
    - Round-1 helper: every body column emitted as nullable Utf8.
      The typed schema (Float32, Date32, etc.) applies on the
      from_columns / from_columns_partial path which has typed
      input. Round 3 adds typed-value reconstruction inside
      triples_to_batch.

  parse_entity_id_from_label(label, expected_type) -> Option<u64>
    - Private helper. Matches the canonical mint format.

Two new TranscodeError variants:
  EntityTypeMismatch { expected, got }
  BadSubjectLabel(String)
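
The grouping and label-parsing steps described above can be sketched with standard-library types only. `Triple`, `GroupError`, `parse_label` and `group_by_subject` are hypothetical stand-ins for `ExpandedTriple`, `TranscodeError` and the real helpers; the output is a vector of rows rather than a RecordBatch, and the missing-required check is omitted.

```rust
use std::collections::BTreeMap;

/// Stand-in for ExpandedTriple (hypothetical field set).
pub struct Triple {
    pub subject_label: String,
    pub predicate: String,
    pub object_label: String,
}

#[derive(Debug, PartialEq)]
pub enum GroupError {
    EntityTypeMismatch { expected: String, got: String },
    BadSubjectLabel(String),
}

/// Split `entity:{type}:{id}` into its type and numeric id.
fn parse_label(label: &str) -> Option<(&str, u64)> {
    let mut parts = label.splitn(3, ':');
    if parts.next()? != "entity" {
        return None;
    }
    let ty = parts.next()?;
    let id = parts.next()?.parse().ok()?;
    Some((ty, id))
}

/// Group triples into one row per subject, in lexicographic label order.
/// Triples whose predicate is not declared are dropped silently.
pub fn group_by_subject(
    expected_type: &str,
    declared: &[&str],
    triples: &[Triple],
) -> Result<Vec<(u64, BTreeMap<String, String>)>, GroupError> {
    let mut rows: BTreeMap<&str, (u64, BTreeMap<String, String>)> = BTreeMap::new();
    for t in triples {
        let (ty, id) = parse_label(&t.subject_label)
            .ok_or_else(|| GroupError::BadSubjectLabel(t.subject_label.clone()))?;
        if ty != expected_type {
            return Err(GroupError::EntityTypeMismatch {
                expected: expected_type.to_string(),
                got: ty.to_string(),
            });
        }
        let row = rows
            .entry(t.subject_label.as_str())
            .or_insert_with(|| (id, BTreeMap::new()));
        if declared.contains(&t.predicate.as_str()) {
            row.1.insert(t.predicate.clone(), t.object_label.clone());
        }
    }
    Ok(rows.into_values().collect())
}
```

Because the map is keyed by the label string, `entity:Patient:10` sorts before `entity:Patient:2`; the order is stable and lexicographic, not numeric by id.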

## Tests

5 prior + 7 new (zerocopy:: total 12, all pass under query-lite +
auth-rls-lite):
  triples_to_batch_produces_one_row_per_subject
  triples_to_batch_rejects_mixed_entity_types
  triples_to_batch_returns_empty_batch_for_empty_input
  triples_to_batch_drops_undeclared_predicates_silently
  triples_to_batch_rejects_missing_required_column
  triples_to_batch_subject_label_round_trip
  triples_to_batch_preserves_lex_subject_order

## What's deferred

- Typed value reconstruction (round 3): every body column emits
  Utf8 today; round 3 parses object_label according to
  semantic_type (Currency -> Float32, Date -> Date32, etc.).
- SpoStore reader proper: still needs a side-table mapping
  subject fingerprint -> entity_id. Consumer-side; tracked in
  `.claude/plans/sql-spo-ontology-bridge-v1.md`.

Verified: cargo check + cargo test transcode::zerocopy:: -> 12/12
pass under {query-lite, query-lite+auth-rls-lite}. Clippy clean,
fmt clean.